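The demand for computing continues to grow exponentially. This growth will translate into exponential growth in computing's energy consumption unless improvements in its energy efficiency can outpace increases in its demand. Yet after decades of research, further improving energy efficiency is becoming increasingly challenging, because it is already highly optimized. As a result, at some point, increases in computing demand are likely to outpace increases in its energy efficiency, potentially by a wide margin. Such exponential growth, if left unchecked, will position computing as a substantial contributor to global carbon emissions. While prominent technology companies have recognized the problem and sought to reduce their carbon emissions, they understandably focus on their successes, which has the potential to inadvertently convey the false impression that this is now, or will soon be, a solved problem. Such false impressions can be counterproductive if they discourage further research in this area since, as we discuss, eliminating the carbon emissions of computing, and of society more generally, is far from a solved problem. To better understand the problem's scope, this paper distills the fundamental trends that determine computing's carbon footprint and their implications for achieving sustainable computing.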
Self-supervised pre-trained transformers have improved the state of the art on a variety of speech tasks. Due to the quadratic time and space complexity of self-attention, they usually operate at the level of relatively short (e.g., utterance) segments. In this paper, we study the use of context, i.e., surrounding segments, during fine-tuning and propose a new approach called context-aware fine-tuning. We attach a context module on top of the last layer of a pre-trained model to encode the whole segment into a context embedding vector, which is then used as an additional feature for the final prediction. During the fine-tuning stage, we introduce an auxiliary loss that encourages this context embedding vector to be similar to the context vectors of surrounding segments. This allows the model to make predictions without access to these surrounding segments at inference time and requires only a tiny overhead compared to standard fine-tuned models. We evaluate the proposed approach using the SLUE and Libri-light benchmarks for several downstream tasks: automatic speech recognition (ASR), named entity recognition (NER), and sentiment analysis (SA). The results show that context-aware fine-tuning not only outperforms a standard fine-tuning baseline but also rivals a strong context injection baseline that uses neighboring speech segments during inference.
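The auxiliary loss described above can be sketched roughly as follows. The abstract does not say which similarity measure is used, so the mean squared distance, the function names, and the `aux_weight` parameter are all assumptions made for illustration, not the authors' formulation.

```python
def context_aux_loss(context_vec, neighbor_vecs):
    """Mean squared L2 distance between a segment's context vector
    and the context vectors of its surrounding segments (assumed measure)."""
    if not neighbor_vecs:
        return 0.0
    total = 0.0
    for neighbor in neighbor_vecs:
        total += sum((a - b) ** 2 for a, b in zip(context_vec, neighbor))
    return total / len(neighbor_vecs)


def total_loss(task_loss, context_vec, neighbor_vecs, aux_weight=0.1):
    """Task loss plus the weighted auxiliary term (aux_weight is hypothetical)."""
    return task_loss + aux_weight * context_aux_loss(context_vec, neighbor_vecs)
```

Because the neighbors appear only in the loss, a model trained this way would need just the current segment at inference time, which matches the low overhead the abstract claims.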
Concept bottleneck models (CBMs) (Koh et al. 2020) are interpretable neural networks that first predict labels for human-interpretable concepts relevant to the prediction task, and then predict the final label based on the concept label predictions. We extend CBMs to interactive prediction settings where the model can query a human collaborator for the labels of some concepts. We develop an interaction policy that, at prediction time, chooses which concepts to request a label for so as to maximally improve the final prediction. We demonstrate that a simple policy combining concept prediction uncertainty and influence of the concept on the final prediction achieves strong performance and outperforms a static approach proposed in Koh et al. (2020) as well as active feature acquisition methods proposed in the literature. We show that the interactive CBM can achieve accuracy gains of 5-10% with only 5 interactions over competitive baselines on the Caltech-UCSD Birds, CheXpert and OAI datasets.
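One way to picture a policy that combines uncertainty and influence is the sketch below. It assumes binary concepts, uses entropy as the uncertainty measure, takes a per-concept influence value as given, and multiplies the two; none of these choices is stated in the abstract, and the function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) concept prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def select_concepts(concept_probs, influence, budget):
    """Return indices of the `budget` concepts with the highest
    uncertainty * |influence| score (the product is an assumed combination)."""
    scores = [binary_entropy(p) * abs(w) for p, w in zip(concept_probs, influence)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]
```

For example, `select_concepts([0.5, 0.99, 0.6, 0.5], [1.0, 5.0, 2.0, 0.1], budget=2)` picks concepts 2 and 0: concept 1 has high influence but is already predicted confidently, and concept 3 is uncertain but barely affects the final label.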
Selective classification involves identifying the subset of test samples that a model can classify with high accuracy, and is important for applications such as automated medical diagnosis. We argue that this capability of identifying uncertain samples is valuable for training classifiers as well, with the aim of building more accurate classifiers. We unify these dual roles by training a single auxiliary meta-network to output an importance weight as a function of the instance. This measure is used at train time to reweight training data, and at test time to rank test instances for selective classification. A second, key component of our proposal is the meta-objective of minimizing dropout variance (the variance of classifier output when subjected to random weight dropout) for training the meta-network. We train the classifier together with its meta-network using a nested objective of minimizing classifier loss on training data and meta-loss on a separate meta-training dataset. We outperform the current state of the art on selective classification by substantial margins, for instance up to 1.9% AUC and 2% accuracy on a real-world diabetic retinopathy dataset. Finally, our meta-learning framework extends naturally to unsupervised domain adaptation, given our unsupervised variance minimization meta-objective. We show cumulative absolute gains of 3.4% / 3.3% accuracy and AUC over the other baselines in domain shift settings on the Retinopathy dataset using unsupervised domain adaptation.
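The dropout variance defined in parentheses above can be estimated by Monte Carlo sampling. The sketch below does this for a toy logistic classifier with inverted dropout on its weights; the classifier, the sample count, and the function name are assumptions for illustration, and the paper's networks and estimator may differ.

```python
import math
import random


def dropout_variance(weights, x, drop_prob, n_samples=100, seed=0):
    """Variance of a logistic classifier's output under random weight dropout.

    Each weight is kept with probability 1 - drop_prob and rescaled
    (inverted dropout). Requires 0 <= drop_prob < 1.
    """
    rng = random.Random(seed)
    keep = 1.0 - drop_prob
    outputs = []
    for _ in range(n_samples):
        # Sum only over the weights that survive this dropout mask.
        z = sum((w / keep) * xi for w, xi in zip(weights, x) if rng.random() < keep)
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    mean = sum(outputs) / len(outputs)
    return sum((o - mean) ** 2 for o in outputs) / len(outputs)
```

Because this quantity needs no labels, it can be computed on unlabeled data, which is what lets the abstract's meta-objective carry over to unsupervised domain adaptation.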
Many real-world learning scenarios face the challenge of slow concept drift, where data distributions change gradually over time. In this setting, we pose the problem of learning temporally sensitive importance weights for training data, in order to optimize predictive accuracy. We propose a class of temporal reweighting functions that can capture multiple timescales of change in the data, as well as instance-specific characteristics. We formulate a bi-level optimization criterion, and an associated meta-learning algorithm, by which these weights can be learned. In particular, our formulation trains an auxiliary network to output weights as a function of training instances, thereby compactly representing the instance weights. We validate our temporal reweighting scheme on a large real-world dataset of 39M images spread over a 9-year period. Our extensive experiments demonstrate the necessity of instance-based temporal reweighting in the dataset, and achieve significant improvements over classical batch-learning approaches. Further, our proposal easily generalizes to a streaming setting and shows significant gains compared to recent continual learning methods.
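A reweighting function with multiple timescales could, for instance, be a mixture of exponential decays over instance age. The abstract does not give the functional form, so the mixture, the parameter names `timescales` and `mix`, and the function itself are assumptions for illustration.

```python
import math


def temporal_weight(age, timescales, mix):
    """Importance weight for a training instance of the given age,
    as a mixture of exponential decays (one term per timescale)."""
    if len(timescales) != len(mix):
        raise ValueError("timescales and mix must have the same length")
    return sum(m * math.exp(-age / tau) for m, tau in zip(mix, timescales))
```

In such a design, the mixing coefficients `mix` could be produced per instance by an auxiliary network, which is one plausible reading of the instance-specific weighting the abstract mentions.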
Decentralized bilevel optimization has received increasing attention recently due to its foundational role in many emerging multi-agent learning paradigms (e.g., multi-agent meta-learning and multi-agent reinforcement learning) over peer-to-peer edge networks. However, to work with the limited computation and communication capabilities of edge networks, a major challenge in developing decentralized bilevel optimization techniques is to lower sample and communication complexities. This motivates us to develop a new decentralized bilevel optimization algorithm called DIAMOND (decentralized single-timescale stochastic approximation with momentum and gradient-tracking). The contributions of this paper are as follows: i) our DIAMOND algorithm adopts a single-loop structure rather than following the natural double-loop structure of bilevel optimization, which offers low computation and implementation complexity; ii) compared to existing approaches, the DIAMOND algorithm does not require any full gradient evaluations, which further reduces both sample and computational complexities; iii) through a careful integration of momentum information and gradient tracking techniques, we show that the DIAMOND algorithm achieves $\mathcal{O}(\epsilon^{-3/2})$ sample and communication complexities for reaching an $\epsilon$-stationary solution, both of which are independent of the dataset sizes and significantly outperform existing works. Extensive experiments also verify our theoretical findings.
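To illustrate only the gradient-tracking ingredient named above, here is a minimal sketch on a toy single-level problem. It is not DIAMOND: there is no bilevel structure, no momentum, and no stochastic gradients. Each agent i holds a hypothetical local objective 0.5 * (x - b_i)^2, and the mixing matrix is assumed to be doubly stochastic.

```python
def gradient_tracking(targets, mixing, step=0.1, iters=200):
    """Decentralized gradient tracking for min_x sum_i 0.5 * (x - targets[i])**2.

    `mixing` is a doubly stochastic matrix (list of rows) that encodes
    how agents average with their neighbors.
    """
    n = len(targets)

    def grad(i, x):
        return x - targets[i]  # gradient of agent i's local objective

    x = [0.0] * n
    g = [grad(i, x[i]) for i in range(n)]
    y = list(g)  # tracker of the network-average gradient
    for _ in range(iters):
        # Consensus step on the iterates, then move along the tracked gradient.
        x_new = [sum(mixing[i][j] * x[j] for j in range(n)) - step * y[i]
                 for i in range(n)]
        g_new = [grad(i, x_new[i]) for i in range(n)]
        # Consensus step on the trackers plus the local gradient change.
        y = [sum(mixing[i][j] * y[j] for j in range(n)) + g_new[i] - g[i]
             for i in range(n)]
        x, g = x_new, g_new
    return x
```

With targets `[1.0, 2.0, 6.0]`, every agent should approach the global minimizer, which is the mean of the targets, even though each agent only ever evaluates its own local gradient.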
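We hypothesize that existing sentence-level machine translation (MT) metrics become less effective when the human reference contains ambiguities. To verify this hypothesis, we present a very simple method for extending pretrained metrics to incorporate context at the document level. We apply our method to three popular metrics, BERTScore, Prism and COMET, and to the reference-free metric COMET-QE. We evaluate the extended metrics on the WMT 2021 metrics shared task using the provided MQM annotations. Our results show that the extended metrics outperform their sentence-level counterparts in about 85% of the tested conditions, when excluding results on low-quality human references. Additionally, we show that our document-level extension of COMET-QE dramatically improves its accuracy on discourse phenomena tasks, outperforming a dedicated baseline by up to 6.1%. Our experimental results support our initial hypothesis and show that a simple extension of the metrics permits them to take advantage of context to resolve ambiguities in the reference.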
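The abstract does not spell out how context is incorporated. One simple possibility, sketched here purely as an assumption, is to prepend a few preceding sentences to the sentence being scored before passing it to the metric. The separator string, the window size, and the function name are hypothetical.

```python
def add_context(sentences, index, context_size=2, sep=" </s> "):
    """Prepend up to `context_size` preceding sentences to sentences[index]."""
    start = max(0, index - context_size)
    return sep.join(sentences[start:index + 1])
```

For example, `add_context(["a", "b", "c", "d"], 3)` yields `"b </s> c </s> d"`. Which side supplies the context (hypothesis, reference, or source) and whether the score is computed over the whole input or only the current sentence are design choices this sketch leaves open.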
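Pose graph optimization (PGO) is a special case of the simultaneous localization and mapping problem in which the only variables to be estimated are pose variables and the only measurements are inter-pose constraints. The vast majority of PGO techniques are vertex-based (the variables are robot poses), but recent work has parameterized the pose graph optimization problem in a relative fashion (the variables are the transforms between poses), utilizing a minimum cycle basis to maximize the sparsity of the problem. We explore the construction of a cycle basis in an incremental manner while maximizing sparsity. We validate an algorithm that constructs a sparse cycle basis incrementally and compare its performance with that of a minimum cycle basis. Additionally, we present an algorithm to approximate the minimum cycle basis of two graphs, as commonly arise in multi-agent scenarios. Lastly, the relative parameterization of pose graph optimization has been limited to using rigid-body transforms on SE(2) or SE(3) as the constraints between poses. We introduce a methodology that allows lower-degree-of-freedom measurements to be used in the relative pose graph optimization problem. We provide extensive validation of our algorithms on standard benchmarks, simulated datasets, and custom hardware.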
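As background for the incremental construction, the sketch below builds a fundamental cycle basis one edge at a time: an edge that joins two already-connected vertices closes a cycle made of the existing tree path plus that edge. This is the textbook spanning-tree construction, not the sparsity-maximizing algorithm of the paper, and the function names are hypothetical.

```python
from collections import deque


def tree_path(tree, start, goal):
    """Return the vertex path from start to goal in the forest, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in tree[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


def incremental_cycle_basis(edges):
    """Process edges in order; each loop-closing edge adds one basis cycle.

    A cycle is returned as the tree path from u to v; the closing edge
    (v, u) is implicit.
    """
    tree = {}
    cycles = []
    for u, v in edges:
        tree.setdefault(u, set())
        tree.setdefault(v, set())
        path = tree_path(tree, u, v)
        if path is None:
            tree[u].add(v)  # edge extends the spanning forest
            tree[v].add(u)
        else:
            cycles.append(path)  # edge closes a loop
    return cycles
```

For a connected graph this yields E - V + 1 cycles, the dimension of the cycle space. The resulting cycles can be much longer than those of a minimum cycle basis, which is the sparsity gap the abstract sets out to reduce.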
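Twenty20 cricket, sometimes written Twenty-20 and often abbreviated to T20, is a short form of cricket. In a Twenty20 game, the two teams each have a single innings, which is restricted to a maximum of 20 overs. This version of cricket is especially unpredictable, which is one of the reasons it has gained popularity in recent times. Nevertheless, in this paper we try four different approaches for predicting the results of T20 cricket matches. Specifically, we take into account: previous performance statistics of the players involved in the competing teams, ratings of players obtained from reputed cricket statistics websites, grouping players with similar performance statistics, and an Elo-based approach to rating players. We compare the performance of each of these approaches using logistic regression, support vector machines, Bayesian networks, decision trees, and random forests.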
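For readers unfamiliar with Elo ratings, the standard update rule is sketched below. How the paper adapts it to individual cricket players is not stated in the abstract, so this shows only the generic two-competitor form, with the common default K-factor of 32 as an assumption.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b); score_a is 1 for a win by A,
    0.5 for a draw, and 0 for a loss."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b
```

With two competitors both rated 1500, a win by A gives `update_elo(1500, 1500, 1.0)` equal to `(1516.0, 1484.0)`: the winner gains exactly what the loser gives up.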
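We introduce an approach for explaining the results of various linear and hierarchical multi-criteria decision-making (MCDM) techniques such as WSM and AHP. The two key ideas are (a) to maintain a fine-grained representation of the values manipulated by these techniques and (b) to derive explanations from these representations through merging, filtering, and aggregating operations. An explanation in our model presents a high-level comparison of two alternatives in an MCDM problem, presumably an optimal and a non-optimal one, illuminating why one alternative was preferred over the other. We show the usefulness of our techniques by generating explanations for two well-known examples from the MCDM literature. Finally, we show their efficacy by performing computational experiments.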
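For the weighted sum model (WSM), the idea of keeping fine-grained values and then filtering them into an explanation can be sketched as below. The per-criterion difference and the sort by magnitude are assumptions for illustration; the paper's actual representation and operators are not described in the abstract, and all names here are hypothetical.

```python
def wsm_score(values, weights):
    """Weighted sum model: total score of one alternative."""
    return sum(v * w for v, w in zip(values, weights))


def contribution_differences(values_a, values_b, weights, criteria):
    """Per-criterion weighted advantage of A over B, largest magnitude first.

    Positive entries favor A, negative entries favor B; the entries sum
    to wsm_score(A) - wsm_score(B).
    """
    diffs = [(c, w * (a - b))
             for c, w, a, b in zip(criteria, weights, values_a, values_b)]
    return sorted(diffs, key=lambda item: abs(item[1]), reverse=True)
```

With weights `[2, 1, 1]` on `price`, `quality`, and `speed`, alternative A = `[4, 5, 1]` scores 14 against 13 for B = `[5, 1, 2]`. The sorted differences show that A wins because its advantage on `quality` (+4) outweighs B's advantages on `price` (-2) and `speed` (-1).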